Abstract

Middlebury College’s Human Geography with GIS course (GEOG 0261) regularly conducts an analysis of “Flood Hazard Vulnerability in Vermont’s Mobile Homes” using QGIS. The GEOG 0261 analysis builds on Baker et al.’s 2014 study, Rapid Flood Exposure Assessment of Vermont Mobile Home Parks Following Tropical Storm Irene (see bottom of this report for formal citation).

In this report, I conduct a reproduction study of the analysis conducted in GEOG 0261, using a code-based approach to spatial analysis in R instead of QGIS. The motivation for this study is to see whether a basic spatial analysis assignment geared towards beginner geography students can be reproduced using a code-based approach. Additionally, I seek to improve the internal validity of the study by reducing the impact of a boundary distortion along the Connecticut River. Below is the background on the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment given to students in GEOG 0261:

“Accurate assessment of risk is an essential for effective response to any natural disaster. The methodologies used to assess risk can end up underestimating vulnerabilities. Tropical Storm Irene offers an example of inadequate assessment of risk, which then leads to inadequate planning for and response to a disaster. The storm inundated Vermont with unprecedented rainfall on August 28 and 29 of 2011. The storm destroyed 480 bridges and 960 culverts (where streams cross under a road), causing $350 million in road damage and cutting off road access to 13 mountain communities. Even Vermont’s emergency management offices were flooded! Some of the most affected people were living in mobile homes, whether on individual parcels of land or in mobile home parks. At least 130 mobile homes were destroyed and an additional 300 severely damaged (Figure 1). Our problem will evaluate assessments of flooding risks with a focus on mobile homes in Vermont. There are two different ways of assessing flooding risk in Vermont: one is by the federal agency, FEMA (The Federal Emergency Management Association), and one by a state agency, Vermont Rivers Program. The federal agency, FEMA, estimates flood risk in terms of inundation from rising water levels in stable river channels. Based on existing channels, FEMA hydrologists estimate the region of land that would be potentially flooded by a 1% (100-year) flood. The residents with mortgages in that region are required to purchase flood insurance. The state of Vermont’s River Corridors Program estimates flooding risk differently, using river corridors. After Irene, the state of Vermont recognized that the most damaging flooding in Vermont is not due to inundation but rather due to fluvial erosion: the erosion of riverbanks as the river channel widens or migrates to form new channels (Figure 1 and Figure 2). By this estimation, regions where rivers may erode and migrate to in the future are also at risk of flooding.”

Materials and procedure

Computational environment

Set up file path shortcuts

Data and variables

There are five layers for this analysis. The primary data sources are listed below.

  1. e911pts.shp - point - epsg: 32145. E911 point location data for all residences and buildings in Vermont, for use with emergency response. The data file can be found on the Vermont Open GeoData Portal http://geodata.vermont.gov/
  • SITETYPE: type of building structure/use case of the structure. “MOBILEHOME” is the SITETYPE that indicates a site is a mobile home.
  2. FEMA_100yr.shp - polygon - epsg: 32145. FEMA Flood Zone polygons with codes. Codes starting with “A” indicate a 100-year flood risk zone. The data file can be found on the Vermont Open GeoData Portal http://geodata.vermont.gov/
  • FLD_ZONE: contains FEMA Flood Zone Codes. If a polygon has a code that begins with an “A”, then that polygon indicates a 100-year flood zone.
  3. river_corridors.shp - polygon - epsg: 32145. Vermont river corridor polygons, as defined by Flood Ready Vermont. This flood hazard approach includes streams (with a 50 foot buffer) and rivers with watersheds more than 2km. The data file can be found on the Vermont Open GeoData Portal http://geodata.vermont.gov/
  4. block_groups.shp - polygon - epsg: 32145. Census block group polygons in southern Vermont, with data on housing. The data file was acquired from the US Census ACS Survey 2014-2018.
  • mobileHU: estimated total number of mobile home housing units within the block group
  • totalHU: estimated total number of all housing units within the block group
  • county: name of the county in which the block group is located
  5. towns.shp - polygon - epsg: 32145. Downloaded from the US Census Bureau.
  • townName: name of the town

You can find the full metadata for each of these variables in the data/metadata section of the repository.

Prior observations

I - the author of this reproduction study - have spent the past 3.5 years living as a student in Vermont. I am familiar with the flood risk faced in Vermont, the geography of the state, and how the state makes data publicly accessible. Thus, I have prior experience with the entirety of this study, although this is not a concern given that no statistical tests are conducted and no models are built. The goal of this study is merely to reproduce in R a study that is usually conducted in QGIS.

I was also a student in GEOG 0261 (formerly GEOG 0120) and I conducted this study in January, 2022 in QGIS.

Bias and threats to validity

Going into this study, I know that there is a boundary distortion along the eastern edge of the state, along the Connecticut River, that compromises internal validity. Vermont River Corridors (the shapefile) does not include a river corridor model for the Connecticut River. I will attempt to estimate my own river corridor for the Connecticut River.

Also, I know going into this study that there are issues with small numbers of mobile homes in some towns that are used as denominators in calculating percentages, and this will lead to overly sensitive and overly inflated percentages in some towns. I do not plan to change this, as that would require calculating the area of towns and counties, which I did not have time for when completing this study.

Lastly, the original QGIS study utilizes an area-weighted reaggregation for determining the number of mobile homes at risk in the FEMA flood zones, based on ACS 2014-2018 survey data and assuming an even distribution of mobile homes across counties. However, the GEOG 0261 course has repeatedly demonstrated that this is an inaccurate approach to the research question, and thus I will not try to reproduce this part of the study. This is an instance of the modifiable areal unit problem.

Data transformations

Read in the layers

Exploratory Data Analysis

Visualize the FEMA flood zones and river corridors

## tmap mode set to plotting
## Map saved to C:\Users\wprocter\Documents\GitHub\VT-Mobile-Home-Flooding\results\figures\FEMA_flood_zone_map.pdf
## Size: 6.25 by 7.819444 inches
## Map saved to C:\Users\wprocter\Documents\GitHub\VT-Mobile-Home-Flooding\results\figures\river_corridor_map.pdf
## Size: 6.25 by 7.819444 inches

Clean up the flood zone and river corridor layers into more usable formats

Clean up the FEMA data to identify all the zones that start with “A” (aka in the 100 year flood zone)…treat them all the same

## 
##    A   AE   AO 
##  567 1832    2
## Simple feature collection with 2401 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 424802.5 ymin: 25228.79 xmax: 523718.9 ymax: 158609.9
## Projected CRS: NAD83 / Vermont
## # A tibble: 2,401 × 3
##    FLD_ZONE                                                       geometry flood
##  * <chr>                                                <MULTIPOLYGON [m]> <lgl>
##  1 AE       (((446276.7 124090.2, 446286.8 124096.6, 446295.7 124103.9, 4… TRUE 
##  2 AE       (((446494.3 124089.5, 446493.5 124098, 446491.4 124108.5, 446… TRUE 
##  3 AE       (((444273.9 123609.2, 444286.7 123605.9, 444296.5 123602.8, 4… TRUE 
##  4 AE       (((442981.8 122314.2, 442982.7 122310.9, 442986.9 122303.6, 4… TRUE 
##  5 AE       (((448164.3 123952.9, 448168.3 123952.4, 448171.6 123951.9, 4… TRUE 
##  6 AE       (((444449.3 123728.5, 444452.7 123721.8, 444458.8 123713.4, 4… TRUE 
##  7 A        (((453734.8 130105.7, 453734.8 130137.5, 453734.8 130172.4, 4… TRUE 
##  8 AE       (((443409.4 123216.8, 443410.1 123219.9, 443412.3 123223.9, 4… TRUE 
##  9 AE       (((448766.4 123935.4, 448768.5 123934.6, 448772.2 123933.4, 4… TRUE 
## 10 AE       (((448860.4 123953.6, 448859.9 123957.1, 448854 123964.4, 448… TRUE 
## # ℹ 2,391 more rows

Group by flood (all of the 100 year flood zones) and dissolve the geometry to get a single multipart flood zone

And do the same thing for river corridors…create a single multipart river corridor
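The flag-and-dissolve steps described above can be sketched roughly as follows. This is a minimal illustration rather than the code behind this report: it assumes the sf and dplyr packages, and the toy squares, the `square()` helper, and the object names `fema_toy` and `flood_zone` are hypothetical stand-ins for FEMA_100yr.shp.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a square polygon with its lower-left corner at (x, y)
square <- function(x, y, size = 100) {
  st_polygon(list(rbind(
    c(x, y), c(x + size, y), c(x + size, y + size), c(x, y + size), c(x, y)
  )))
}

# Toy stand-in for the FEMA layer: two adjacent "A" zones and one "X" zone
fema_toy <- st_sf(
  FLD_ZONE = c("AE", "A", "X"),
  geometry = st_sfc(square(0, 0), square(100, 0), square(500, 500), crs = 32145)
)

# Flag every zone whose code starts with "A", keep only those,
# then dissolve them into a single feature by grouping on the flag
flood_zone <- fema_toy %>%
  mutate(flood = startsWith(FLD_ZONE, "A")) %>%
  filter(flood) %>%
  group_by(flood) %>%
  summarize()

table(fema_toy$FLD_ZONE)  # tally of zone codes before filtering
nrow(flood_zone)          # number of features after the dissolve
```

Calling `summarize()` on a grouped sf object unions the geometries within each group, which is what makes it behave like a dissolve. The same pattern would apply to the river corridor layer by grouping on a constant column.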

Calculating table columns to reproduce Table 1

Column 1 (step 1): calculate the total number of mobile homes in each county from the ACS data - aggregate from the block groups

## # A tibble: 4 × 2
##   county     number_of_MHs
## * <chr>              <dbl>
## 1 Bennington          1277
## 2 Rutland             1992
## 3 Windham             1833
## 4 Windsor             2427
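A county-level aggregation like this one can be sketched with a grouped sum. The block-group values below are made up for illustration, and `block_groups_toy` and `mh_by_county` are hypothetical names; only the column names `county`, `mobileHU`, and `totalHU` come from the data description above.

```r
library(dplyr)

# Made-up block-group records; the real values come from block_groups.shp
block_groups_toy <- tibble(
  county   = c("Bennington", "Bennington", "Rutland", "Rutland", "Rutland"),
  mobileHU = c(40, 25, 10, 55, 30),
  totalHU  = c(400, 310, 250, 600, 420)
)

# Sum the block-group estimates of mobile home units within each county
mh_by_county <- block_groups_toy %>%
  group_by(county) %>%
  summarize(number_of_MHs = sum(mobileHU))

mh_by_county
```

If the block groups are held as an sf object, dropping the geometry first with `st_drop_geometry()` avoids unioning the polygons during the summary.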

Columns 3 and 4 (steps 3 and 4): mobile homes at risk

I exclude column 2, which was created using area-weighted reaggregation (AWR) of mobile homes. The GEOG 0261 instructors discovered that this approach is less accurate than using the e911 point data to identify mobile homes at risk, so I will forego reproducing the AWR approach.

Isolate only the points/residences that are mobile homes from the e911 data

Add county and town variable (from the census/ACS data) to the mobile homes points, joining by location

NOTE: I attempted also to sum the number of e911 mobile home points for each county. This yielded a different number of mobile homes than Column 1 indicates. Because the ACS measurement of mobileHU is a survey-based estimate, it does not represent the true number of mobile home structures. This is a source of geographic uncertainty to this analysis, specifically an issue of spatial heterogeneity and construct validity.

I will proceed with using the Total Number of Mobile Homes column from the ACS data to be consistent with the GEOG 0261 analysis, but a future version of this analysis may want to change this.

Add buffer (18.3 meters aka 60 feet) to mobile home points to account for their structure sizes

Identify the mobile homes within FEMA flood zone, and group by county to get total number of MHs in the flood zone by county

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
## # A tibble: 4 × 2
##   county     mobile_home_count
## * <chr>                  <int>
## 1 Bennington               189
## 2 Rutland                  164
## 3 Windham                  299
## 4 Windsor                  198
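The filter, buffer, join-by-location, and count steps can be sketched on toy data as below. This is an illustration under stated assumptions, not the report's code: it assumes sf and dplyr, all geometries and county assignments are made up, the "SINGLE FAMILY" site type is a hypothetical value, and `flood_toy`, `e911_toy`, and `mh_in_flood` are hypothetical names.

```r
library(sf)
library(dplyr)

# Toy flood zone: a 100 m x 100 m square (made-up geometry)
flood_toy <- st_sf(
  flood = TRUE,
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(100, 0), c(100, 100), c(0, 100), c(0, 0)))),
    crs = 32145
  )
)

# Toy e911 points (made-up locations, counties, and site types)
e911_toy <- st_sf(
  SITETYPE = c("MOBILEHOME", "MOBILEHOME", "MOBILEHOME", "SINGLE FAMILY"),
  county   = c("Windham", "Windham", "Windsor", "Windsor"),
  geometry = st_sfc(
    st_point(c(50, 50)),    # inside the zone
    st_point(c(110, 50)),   # 10 m outside, so only its buffer reaches the zone
    st_point(c(300, 300)),  # far from the zone
    st_point(c(50, 50)),    # inside the zone, but not a mobile home
    crs = 32145
  )
)

mh_in_flood <- e911_toy %>%
  filter(SITETYPE == "MOBILEHOME") %>%          # keep mobile homes only
  st_buffer(dist = 18.3) %>%                    # 18.3 m (60 ft) structure buffer
  st_join(flood_toy, join = st_intersects) %>%  # join by location
  filter(!is.na(flood)) %>%                     # keep buffers touching the zone
  st_drop_geometry() %>%
  count(county, name = "mobile_home_count")

mh_in_flood
```

In this toy case the second point is counted only because of the buffer, which shows why the buffer distance affects the totals.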

Now do the same thing but for river corridors…

Identify the mobile homes within river corridors, and group by county to get total number of MHs in the river corridors by county

## # A tibble: 4 × 2
##   county     mobile_home_count
## * <chr>                  <int>
## 1 Bennington               130
## 2 Rutland                  204
## 3 Windham                  298
## 4 Windsor                  353

Join all columns together to get the final table! Table 1:

## # A tibble: 4 × 6
##   county number_of_MHs MHs_at_risk_FEMA MHs_at_risk_River_Co…¹ FEMA_rate RC_rate
##   <chr>          <dbl>            <int>                  <int>     <dbl>   <dbl>
## 1 Benni…          1277              189                    130    0.148    0.102
## 2 Rutla…          1992              164                    204    0.0823   0.102
## 3 Windh…          1833              299                    298    0.163    0.163
## 4 Winds…          2427              198                    353    0.0816   0.145
## # ℹ abbreviated name: ¹​MHs_at_risk_River_Corridors

Unplanned deviation for reproduction: I decided to calculate “risk rates” to indicate what proportion of a county’s mobile homes lie within the FEMA flood zones and the River Corridors, respectively. The two rates are nearly identical for Windham County. In Bennington County, the FEMA risk rate is higher than the River Corridor risk rate. In Windsor County and Rutland County, the River Corridor risk rate is higher than the FEMA risk rate.
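The rate columns are simple ratios of the at-risk counts to the ACS totals. As a worked check, the sketch below recomputes them from the counts printed in Table 1; the object name `table1` is hypothetical, and dplyr is assumed.

```r
library(dplyr)

# Counts copied from Table 1 above
table1 <- tibble(
  county = c("Bennington", "Rutland", "Windham", "Windsor"),
  number_of_MHs = c(1277, 1992, 1833, 2427),
  MHs_at_risk_FEMA = c(189L, 164L, 299L, 198L),
  MHs_at_risk_River_Corridors = c(130L, 204L, 298L, 353L)
) %>%
  mutate(
    # Share of each county's ACS-estimated mobile homes inside each hazard layer
    FEMA_rate = MHs_at_risk_FEMA / number_of_MHs,
    RC_rate   = MHs_at_risk_River_Corridors / number_of_MHs
  )

table1
```

Because the denominator is an ACS estimate while the numerators are e911 point counts, these rates mix two measurements of mobile homes and should be read as approximate.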

Visualize MH flood risk by town:

First, I need to find the number of mobile homes by town

Next, I need to find which mobile homes are in EITHER a FEMA flood zone OR River Corridor

This will cast a wider net than if we look at just flood zones or river corridors individually, as this will maximize the number of mobile homes that are determined to be at risk. At this stage in the analysis, we care more about seeing which towns have the highest vulnerability of mobile homes to flooding, not whether the VT River Corridor or FEMA Flood Zone approach is more accurate. Thus, including both flood risk identification metrics is a safer approach to ensure we identify all mobile homes that are at some level of risk to flooding.
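The either/or logic can be sketched as a logical OR of two intersection tests. This is a toy illustration assuming the sf package; the geometries are made up, and `fema_toy`, `rc_toy`, and `mh_toy` are hypothetical names.

```r
library(sf)

# Made-up geometries: one FEMA square, one river corridor square, three points
fema_toy <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))),
  crs = 32145
)
rc_toy <- st_sfc(
  st_polygon(list(rbind(c(20, 0), c(30, 0), c(30, 10), c(20, 10), c(20, 0)))),
  crs = 32145
)
mh_toy <- st_sfc(
  st_point(c(5, 5)),    # in the FEMA zone only
  st_point(c(25, 5)),   # in the river corridor only
  st_point(c(50, 50)),  # in neither
  crs = 32145
)

# A mobile home is at risk if it intersects EITHER hazard layer
in_fema <- lengths(st_intersects(mh_toy, fema_toy)) > 0
in_rc   <- lengths(st_intersects(mh_toy, rc_toy)) > 0
at_risk <- in_fema | in_rc

sum(at_risk)  # number of toy mobile homes flagged by either layer
```

An equivalent design would be to union the two hazard layers first and test against the combined polygon; keeping the two flags separate preserves which layer flagged each home.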

Create a table to show the 10 towns with the highest mobile home flooding risk (Table 2)

## Simple feature collection with 10 features and 4 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 437815 ymin: 36309.08 xmax: 509676.9 ymax: 148890.4
## Projected CRS: NAD83 / Vermont
## # A tibble: 10 × 5
##    town       mobile_home_count                           geometry at_risk_count
##    <chr>                  <int>                      <POLYGON [m]>         <dbl>
##  1 Woodford                  27 ((446813.5 41030.5, 446912.2 4277…            23
##  2 Woodstock                 70 ((488855 117635, 493730 128803.1,…            48
##  3 Sandgate                   7 ((438054.6 74666.82, 438157.2 768…             4
##  4 Jamaica                  100 ((469989.1 68179.99, 470183.9 717…            49
##  5 Windsor                   63 ((504000.3 113907.5, 507400.5 113…            30
##  6 Killington                13 ((470120.8 124335.6, 471672.5 128…             6
##  7 Pittsfield                 9 ((469974.3 148890.4, 472846.2 147…             4
##  8 Plymouth                  18 ((476628.5 111137.1, 477790.3 119…             8
##  9 Wilmington                94 ((465280.3 39651.14, 465348 41403…            38
## 10 Proctor                   15 ((454734.2 131983.4, 455376.5 132…             6
## # ℹ 1 more variable: pct_mh_at_risk <dbl>

Results differ slightly between this reproduction and the GEOG 0261 results because I directly downloaded the towns.shp file from the VT Open GeoData Portal, while the GEOG 0261 class uses a pre-cleaned towns shapefile layer provided by the instructors. I could not use the provided one due to file corruption issues. The layer provided to the class distinguishes between Rutland Town and Rutland City, while the layer that I downloaded treats the two as a combined town of “Rutland.” Rutland Town has a pct_mh_at_risk value of 57.89, so if my towns file distinguished just the town portion, Rutland Town would be among the 10 towns with the highest share of mobile homes at risk.

Plot a choropleth map of the % of mobile homes in each town that are at risk to flooding, and export (Map 1)

## tmap mode set to plotting
## Map saved to C:\Users\wprocter\Documents\GitHub\VT-Mobile-Home-Flooding\results\figures\pct_mh_at_risk_by_town.pdf
## Size: 9.125 by 5.361111 inches

Unplanned deviation for reproduction: identify areas where FEMA 100 year Flood Zones and VT River Corridors do not line up

I was curious about the discrepancy between the two metrics so created a layer that differences the two polygon layers and plots it with a satellite base-map. Notably, the River Corridors include fewer lakes/ponds and large rivers. This is significant, because these water bodies can still cause severe flooding if water influx causes them to over spill their banks. Additionally, note that the Connecticut River is not included at all in the River Corridors.

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
## tmap mode set to interactive viewing
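Differencing two polygon layers can be sketched as below. This is a toy example assuming the sf package, with made-up overlapping rectangles standing in for the dissolved FEMA and River Corridor layers; all object names are hypothetical.

```r
library(sf)

# Two made-up rectangles that overlap between x = 10 and x = 20
fema_toy <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(20, 0), c(20, 10), c(0, 10), c(0, 0)))),
  crs = 32145
)
rc_toy <- st_sfc(
  st_polygon(list(rbind(c(10, 0), c(30, 0), c(30, 10), c(10, 10), c(10, 0)))),
  crs = 32145
)

fema_only <- st_difference(fema_toy, rc_toy)      # FEMA zone not in the corridor
rc_only   <- st_difference(rc_toy, fema_toy)      # corridor not in the FEMA zone
mismatch  <- st_sym_difference(fema_toy, rc_toy)  # everything covered by only one

# Areas in square meters for each piece
as.numeric(st_area(fema_only))
as.numeric(st_area(rc_only))
sum(as.numeric(st_area(mismatch)))
```

Running the difference on sf objects that carry attribute columns is what produces the "attribute variables are assumed to be spatially constant" warning; working on bare geometries, as here, avoids it.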

Planned deviation for reproduction: Identify how many mobile homes would be likely to be in a River Corridor of the Connecticut River, if the creators of the River Corridor shapefile had included the Connecticut River.

This will help reduce the geographic uncertainty caused by the boundary effect of the Connecticut River technically being fully part of New Hampshire, not Vermont.

## # A tibble: 2 × 2
##   county  mobile_home_count
## * <chr>               <int>
## 1 Windham                61
## 2 Windsor               283

While this is definitely a bit of an overestimate, it does highlight how the VT River Corridor model’s omission of the Connecticut River excludes a significant number of mobile homes. There are a large number of mobile home parks close to the Connecticut River, which is a contributing factor. 61 mobile homes in Windham and 283 mobile homes in Windsor are at risk of flooding, meaning that they intersect my estimated river corridor for the Connecticut River.

To estimate a Connecticut River Corridor polygon layer (ct_river_corridor), I performed the following in QGIS:
- I loaded a line layer for the centerline of the river channel, which I downloaded from the VT Open GeoData Portal.
- I estimated the average width of the river channel by taking measurements with the measurement tool at various points along the river, and I assumed an average channel width of 180 meters (this is probably a bit of an overestimate). I then buffered the centerline by 90 meters on either side to get a polygon representing a river channel that is consistently 180 meters wide.
- I then added an additional 450 meter buffer on either side. This creates a polygon representing the river corridor that is consistently 1080 meters wide (540 meters from the centerline on either side), or 6x the 180 meter average channel width.

The report on how River Corridors were constructed, available on the Flood Ready Vermont website, says that a good rule of thumb is that a river corridor is 6x the width of the stream/river channel that it is based around. While the authors also take slope and geologic traits into account to provide a more specific river corridor profile, that is beyond the scope of this analysis, and I will make do with the 6x width approach.
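The buffer arithmetic behind the 6x rule can be worked through on a toy centerline. The sketch below assumes the sf package and a straight, made-up 1 km line; the real estimate was built in QGIS from the downloaded centerline, and every object name here is hypothetical.

```r
library(sf)

channel_width  <- 180                  # assumed average channel width (m)
corridor_width <- 6 * channel_width    # 6x rule of thumb: 1080 m
channel_half   <- channel_width / 2    # first buffer: 90 m either side
extra_buffer   <- corridor_width / 2 - channel_half  # second buffer needed

# Made-up straight centerline running east-west
centerline <- st_sfc(st_linestring(rbind(c(0, 0), c(1000, 0))), crs = 32145)

channel  <- st_buffer(centerline, dist = channel_half)  # channel polygon
corridor <- st_buffer(channel, dist = extra_buffer)     # estimated corridor

# Measure the corridor width across the channel from its bounding box
bb <- st_bbox(corridor)
corridor_measured <- bb[["ymax"]] - bb[["ymin"]]

extra_buffer
corridor_measured
```

For a 180 m channel, the second buffer that yields a 1080 m corridor works out to 450 m per side, since 90 m of the 540 m half-width is already covered by the channel polygon.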

Check out the polygon that I made to estimate the River Corridor for the Connecticut River

## tmap mode set to interactive viewing

Results

Table 1: Mobile Homes at Risk by County (FEMA vs. River Corridors)

Ultimately, an R-based code approach to this spatial analysis study yields the exact same results that are found by Middlebury’s GEOG 0261 class using the QGIS GUI. I find that the number of total mobile homes (ACS data), mobile homes at risk (e911 and FEMA data), and mobile homes at risk (e911 and river corridor data) for Bennington, Rutland, Windham, and Windsor Counties are exactly the same. In Bennington County, the FEMA approach identifies significantly more mobile homes to be at risk than the River Corridor approach does. In Windham County, the two approaches predict about the same. In Rutland and Windsor Counties, the River Corridor approach identifies significantly more mobile homes to be at risk than the FEMA approach. These findings hold for both the total number of mobile homes at risk and the proportion of mobile homes in the county that are at risk.

Map 1 and Table 2: Percentage of Mobile Homes at Risk to Flooding (Either FEMA or River Corridor), by Town

A slight variation, as discussed earlier in the report, arises from a difference in the town shapefile layer. The town layer that was provided to the GEOG 0261 class distinguishes between Rutland Town and Rutland City, while the town layer that I downloaded from the American Community Survey (since I was having corrupt file issues with the towns.shp used by the GEOG 0261 class) just shows Rutland as a single town. It is possible that the course instructors and I both started with the same town shapefile, and that the instructors split Rutland into two towns during their data processing. Another possibility is that the town borders in the published file changed between the time the instructors downloaded it and the time I downloaded it (i.e. Rutland City and Rutland Town were merged into a single Rutland). It is difficult to know exactly without knowing when the instructors downloaded their data and what they did to process it.

Reproducing the town-based risk map yielded the same exact results as the GEOG 0261 QGIS approach, except for Rutland. When sorting towns based on highest percentage of their mobile homes at risk in the QGIS approach (Table 2), Rutland Town had the third highest % of mobile homes at risk out of all of the towns in the four counties. However, when considering Rutland as a single entity in my R reproduction, it does not make the top 10 list. Instead, all the towns below it get shifted up one place, with Proctor, VT being the town with the 10th highest risk. There are some threats to validity here, namely the small number problem, which I will address in the Discussion section.

Woodford and Woodstock have the first and second highest % of mobile homes at risk in both R and QGIS, with 85.19% and 68.57% respectively of their mobile homes at some risk to flooding, as determined by FEMA or River Corridors.

R code vs. QGIS GUI:

Ultimately, the main goal of this reproduction was to find a way using R code to do all the spatial analysis steps that are taught to GEOG 0261 students in QGIS in the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment for the course. I followed the exact same workflow provided by the instructors of GEOG 0261 to create Table 1. While I technically could have done a select by location to identify the mobile homes in the flood zones and river corridors, I chose to use a join by location and then a filter because of a warning that was thrown and because it was computationally faster.

I relied on the stars and tmap libraries to work with the spatial data.

Addition of the estimated Connecticut River corridor:

Based on my estimation of the river corridor for the Connecticut River, I found that 61 mobile homes in Windham and 283 in Windsor are missed when using the existing river corridor shapefile that is published on the VT Open GeoData Portal and is provided to the GEOG 0261 students. Although I likely overestimate the extent of the river corridor a bit (since I did not account for geologic conditions or topography/slope…I only considered meander belt width), this shows that VT River Corridors do miss out on identifying mobile homes that could be at risk. Fortunately, the FEMA Flood Zones do seem to contain most of the mobile homes along the Connecticut River that do appear to be at risk (see the presence of FEMA Flood Zones in the first map presented at the top of this report…the map of Flood Zones).

Discussion

Sources of uncertainty and geographic threats to validity:

There are several threats to validity/sources of uncertainty, as proposed by Schmitt (1978), that are inherent to the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment, as well as to this reproduction study of it. Although this reproduction study partially addresses the boundary distortion problem posed by the lack of published River Corridors along the Connecticut River, it by no means fixes all the sources of uncertainty.

There are a few potential threats to construct validity that this reproduction study does not directly address.

Notably, Table 2 and Map 1 illustrate how the study runs into a small number problem when calculating the % of mobile homes at risk in each town. Because some towns have very few mobile homes to begin with (ex. Sandgate has 7, Pittsfield has 9), any rate or % calculation that uses the number of mobile homes as its denominator is highly sensitive and easily distorted. Because Sandgate and Pittsfield each have 4 mobile homes at risk, their % of mobile homes at risk is very high. Sandgate is over 50%! This makes it seem like they have a huge problem with their mobile homes being at risk. Sandgate has the 3rd highest at-risk rate of any town, and Pittsfield has the 7th highest, despite the two of them having only 8 mobile homes at risk combined. Meanwhile, Pownal, which has 135 mobile homes at risk out of its 359 total mobile homes (37.6%), and Brattleboro, which has 127 mobile homes at risk out of its 401 total mobile homes (31.7%), don’t even crack the top 10 list, and the choropleth map makes their risk rates seem less severe. If the study instead made Map 1 and sorted Table 2 based on the total number of mobile homes at risk (instead of the rate), then we would run into the opposite problem, where the largest towns with the most area would have the most mobile homes at risk…basically running into the Modifiable Areal Unit Problem. It might make more sense in future studies to divide by the area of the town to normalize for town size, rather than dividing the number of mobile homes at risk by the total number of mobile homes. The study needs a better way to normalize. Table 1 (which reports on counties, not towns) does not experience this problem as much because the number of mobile homes is over 1000 in each county, which is a large enough denominator to keep rate calculations from being distorted.
These matters all highlight the importance of scale and areal unit, and are examples of partition distortions and scale distortions affecting the internal validity of the study.
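The small number problem can be made concrete with the town counts quoted above. The sketch below uses base R only; the data frame name `towns_example` and the `pct_shift_per_home` column are hypothetical, introduced here to show how much one additional at-risk mobile home would move each town's percentage.

```r
# Counts quoted in the discussion above
towns_example <- data.frame(
  town = c("Sandgate", "Pittsfield", "Pownal", "Brattleboro"),
  mobile_home_count = c(7, 9, 359, 401),
  at_risk_count = c(4, 4, 135, 127)
)

# Percentage of each town's mobile homes that are at risk
towns_example$pct_mh_at_risk <-
  100 * towns_example$at_risk_count / towns_example$mobile_home_count

# Percentage-point change if one more mobile home were counted as at risk
towns_example$pct_shift_per_home <- 100 / towns_example$mobile_home_count

# Sort from highest to lowest percentage at risk
towns_example[order(-towns_example$pct_mh_at_risk), ]
```

A single additional at-risk home moves Sandgate's percentage by more than 14 points but moves Pownal's by less than a third of a point, which is why the small towns dominate a ranking by rate.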

While Table 1 is more resilient to distortions than Table 2, a concern that I discovered with Table 1 is that it uses ACS estimates to determine the total number of mobile homes in each county, rather than counting the e911 mobile home points that fall within each county. This is a question of measurement and spatial heterogeneity, as the ACS surveys might not adequately represent or weight the population living in mobile homes in Vermont. There are big differences between living in a mobile home park and in an independent mobile home, and I suspect that the ACS surveys do not always take these distinctions into account.

Another threat to validity of the original analysis was the boundary distortion caused along the Connecticut River for creating River Corridors. According to Flood Ready Vermont,

“During the initial development of the river corridor base map, the DEC recognized the Connecticut River flows in a unique geologic and geographic setting and is influenced by numerous impoundments. In order to create an appropriate River Corridor for the Vermont side of the Connecticut, the Rivers Program will conduct a separate analysis in 2019 to review the influence of features such as escarpments and impoundments that affect fluvial processes, valley bottom lands, floodplains, river planform, and corridor widths. For Vermont projects being reviewed while this River Corridor is being developed DEC will make site specific river corridor and floodway determinations in accordance with the Flood Hazard Area & River Corridor Protection Procedure. When the Connecticut River Corridor is developed it will be available for a public review and comment similar to the release of the Statewide River Corridor Base Map.” LINK

I believe that the Connecticut River also may not have been included because the border between VT and NH actually lies along the VT shoreline of the river, and not at the centerline. Hence, the Connecticut River is technically entirely outside of the state of Vermont.

However, there does not seem to be any evidence that the Connecticut River was ever included in 2019, upon searching in the VT Open GeoData Portal. As I discussed earlier in the report, I crudely estimated what river corridor for the Connecticut River might look like, and identified mobile homes that fell within this estimated corridor. This identification of mobile homes that would not have been identified by the River Corridor approach using published River Corridor data helps address the threat to internal validity caused by the boundary distortion on the eastern edge of the state where no River Corridors are defined for the Connecticut River. Additionally, since FEMA flood zones do contain most of the mobile homes along the Connecticut River, there is still some level of acknowledgement that these mobile homes are at risk to flooding. If the FEMA Flood Zones were similarly absent along the Connecticut River, this would be a different story.

For this study, I relied on the data that is provided by the instructors to the students of GEOG 0261, which is preprocessed by the instructors. The formal student instructions for the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” mention that the data is all downloaded from either the VT Open GeoData Portal or from the US Census/American Community Survey website, both of which are freely accessible to a public audience. I used this data because that is what the students receive in class, and I wanted to see if it was possible for a student to reproduce the analysis in the R coding environment. However, I found that the “towns.shp” file from when I was a student in GEOG 0261 (formerly GEOG 0120) was corrupted, and thus I re-downloaded the file from the ACS. However, the file that I downloaded did not distinguish between Rutland Town and Rutland City, as the pre-processed data did.

Using the pre-processed data provided by the instructors certainly makes this work less reproducible and introduces potential sources of uncertainty if one was to download the data from the source themselves and process the data themselves. It is challenging to offer more insight into this not knowing the exact files that the instructors downloaded, when they downloaded them, and what pre-processing steps they took.

I have provided in the data/raw/public folder the unprocessed data layers that I downloaded from the VT Open GeoData Portal and ACS, which I believe were what the instructors used. Users of this reproduction study and its repository can investigate these data sources, but I am not completely confident that all the right data layers are there. Use caution and use these files at your own risk.

The methodology of this study, which is largely adopted from that of Baker et al. (2014), seems to have strong external validity given the assumption that you have point locations for mobile homes (or other structures, if that is what you are investigating the flood risk of). However, since not every county in the US uses E911 (as of 2014 when the Baker et al. paper was published), this methodology might not be reproducible in those counties without point data on mobile homes. As the GEOG 0261 instructors found, using an area-weighted aggregation approach is certainly possible for identifying mobile homes in flood hazard areas, but this would assume an even distribution of mobile homes within a county and is thus less accurate.
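To make the even-distribution assumption concrete, here is a minimal worked sketch of area-weighted aggregation in base R. The function name and all of the numbers are hypothetical and chosen only for illustration; they are not values from this study or from the GEOG 0261 data.

```r
# Hypothetical inputs (illustrative only, not from the study data).
county_area_km2  <- 100   # total area of an example county
flood_area_km2   <- 15    # area of that county inside a flood hazard polygon
mobile_homes_acs <- 200   # county-level estimate of mobile homes

# Area-weighted estimate: assumes mobile homes are spread evenly across
# the county, so the share inside the flood zone equals the share of area.
area_weighted_estimate <- function(count, zone_area, total_area) {
  stopifnot(total_area > 0, zone_area >= 0, zone_area <= total_area)
  count * zone_area / total_area
}

estimated_in_zone <- area_weighted_estimate(
  mobile_homes_acs, flood_area_km2, county_area_km2
)
print(estimated_in_zone)
```

Because mobile homes tend to cluster in parks rather than spread evenly, an estimate like this could be far from the count obtained by intersecting actual point locations with the flood hazard polygons.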

Because this methodology involves little statistical analysis or model building and no statistical tests, there are not many threats to statistical or internal validity. The methodology is very straightforward, and relies mostly on simple spatial analysis techniques such as buffering, joining by location, and selecting by location.

Future additions to the study:

Future additions to the study could work to reduce the sources of uncertainty and threats to validity. Notably, a future study could count the number of mobile home points (from E911 data) in each county to use as the first column in Table 1 (and thus as the denominator in the rate columns). Currently, the table relies on ACS estimates of mobile homes to provide values for the mobile_home_count variable. However, the ACS figure is just a well-informed estimate, whereas E911 points are known locations of mobile homes.
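A rough sketch of that point-count denominator, in base R, might look like the following. The data frame, its column names other than mobile_home_count, and the values are all hypothetical; in practice each point's county and flood flag would come from spatial joins against the county and flood hazard layers.

```r
# Hypothetical E911 mobile home points: one row per point, with the county it
# falls in and whether it intersects a flood hazard polygon (made-up values).
mobile_home_points <- data.frame(
  county        = c("Addison", "Addison", "Addison", "Windsor", "Windsor"),
  in_flood_zone = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Denominator: number of points per county (instead of the ACS estimate).
point_count <- tapply(mobile_home_points$in_flood_zone,
                      mobile_home_points$county, length)
# Numerator: number of those points flagged as inside the flood hazard area.
at_risk <- tapply(mobile_home_points$in_flood_zone,
                  mobile_home_points$county, sum)

county_summary <- data.frame(
  county            = names(point_count),
  mobile_home_count = as.integer(point_count),
  at_risk_count     = as.integer(at_risk),
  pct_at_risk       = 100 * as.numeric(at_risk) / as.numeric(point_count)
)
print(county_summary)
```

With this approach the numerator and denominator come from the same point layer, so a rate can never exceed 100%, which is not guaranteed when point counts are divided by a survey estimate.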

If Flood Ready Vermont ever does create a river corridor for the Connecticut River, that would warrant another reproduction attempt. Alternatively, someone with strong geologic knowledge could try to better estimate a river corridor for the Connecticut River and use that. See the docs/presentation folder for two reports from the Vermont Agency of Natural Resources on how River Corridors are constructed. Or, view them online HERE and HERE.

Additionally, since it is unknown when the GEOG 0261 data was actually constructed, someone could reproduce this study with all newly downloaded data, especially using newer ACS data (newer than 2014-2018).

Pedagogical implications:

Using R to teach this analysis certainly has its advantages over QGIS. Because all of the R code is documented in a single file, it can easily be run all together and re-run as many times as necessary. This makes it especially easy to teach how reproduction studies work. Meanwhile, although QGIS is open source and easily accessible to students, it involves clicking many buttons while following a workflow, with lots of room to go wrong by clicking the wrong button, missing a step, etc. It is easier to make mistakes in QGIS, and harder to see where you went wrong. Using R for basic spatial analysis like in this study is very straightforward, allows various sections of the analysis to be run all at once, and makes it easy to record and track how you created your maps, data frames, variables, and tables. Performing this analysis in R yielded the same results as QGIS (except for the Rutland issue, which arose because I used a different towns layer), meaning that it is completely valid to teach students this study using R. It would not impact their results. This reproduction of the GEOG 0261 study provides strong evidence in support of teaching intro geography students spatial R. There is room for both tools in teaching GIS.

Policy implications:

Both the original study and this reproduction study demonstrate that, across Vermont, mobile homes tend to be at a high risk of flooding. Given that residents of mobile homes tend to be lower-income and from marginalized communities, this warrants targeted policy efforts (especially in the towns and counties that this study shows to be higher risk) to mitigate their vulnerability to flooding.

Connecticut River:

As I have implied already, many mobile homes along the Connecticut River would not be identified as at risk of flooding when using the published River Corridor approach. Because these homes appear to be at an increased risk of being overlooked, policymakers should ensure that they are not left out of any flood program, flood insurance, or other policy efforts designed to mitigate flood risk for vulnerable communities. Fortunately, however, FEMA does acknowledge most of these CT River mobile homes in their 100-year flood zones.

Conclusion

I successfully reproduced the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment from Middlebury College’s GEOG 0261 course (which is taught using QGIS) using a code-based approach in R. I found that the core spatial analysis concepts used are analogous between programs and that using R can be advantageous for minimizing mistakes in the analysis (and for more streamlined, organized, and clear workflows).

Additionally, I find that the original assignment/analysis, which is modeled off of the methodology employed by Baker et al. (2014), utilized a Vermont River Corridor layer that does not exist along the eastern boundary of the state - the Connecticut River. Thus, a not-insignificant number of mobile homes in Windsor and Windham counties are not identified as at risk of flooding when using the Vermont River Corridor approach to determining flood vulnerability.

The original study, as well as this reproduction, indicates the high levels of vulnerability to flooding that mobile homes in Vermont face. This is especially relevant given the catastrophic flooding that Vermont experienced during the summer of 2023, and warrants policy actions to protect those who are most vulnerable.

Lastly, as the capstone project for GEOG 0361, this reproduction was designed to be reproducible and accessible. Open Science is the future of all science. I hope that future authors, and even instructors of the GEOG 0261 course, can take advantage of the code and data that I will make publicly accessible as part of this reproduction.

Integrity Statement

I followed and accomplished what I set out to do in my preanalysis plan. I did make a few slight deviations to improve the code and the interpretability of my report/findings:

- To isolate the mobile homes in the flood risk polygons, instead of selecting by location (1 step), I ended up using a spatial join and then filtering based on mobile homes that were in the flood risk layer (2 steps) because of a warning message I got with the 1 step approach. The 1 step approach took much longer to run.
- I ended up creating a polygon layer that is the difference between FEMA 100yr Flood Zones and VT River Corridor polygons, to show where the two methods of spatially identifying flood risk do not line up. I then plotted this in an interactive tmap with a satellite basemap so that the reader can play around with it to see where they don’t align.
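The two deviations above can be sketched with the sf package as follows. This is a minimal illustration under stated assumptions, not the exact code in this repository: the toy square geometries, the object names, the helper function, and the choice of st_join() and st_difference() are all hypothetical stand-ins for the real layers and calls.

```r
library(sf)

# Hypothetical helper: build an axis-aligned square polygon (illustration only).
make_square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Toy stand-ins for the flood risk layer and the mobile home points.
flood_risk <- st_sf(zone = "flood_risk",
                    geometry = st_sfc(make_square(0, 0, 1)))
mobile_homes <- st_sf(
  id = 1:3,
  geometry = st_sfc(st_point(c(0.2, 0.2)), st_point(c(0.8, 0.5)),
                    st_point(c(2, 2)))
)

# Step 1: spatial join; points outside every polygon get NA in `zone`.
joined <- st_join(mobile_homes, flood_risk, join = st_intersects)
# Step 2: keep only the points that matched a flood risk polygon.
at_risk <- joined[!is.na(joined$zone), ]

# Difference layer: the part of a toy "FEMA" polygon not covered by a toy
# "river corridor" polygon. This is one-sided; st_sym_difference() would
# return the non-overlapping parts of both layers instead.
fema      <- st_sfc(make_square(0, 0, 1))
corridor  <- st_sfc(make_square(0.5, 0.5, 1))
fema_only <- st_difference(fema, corridor)
fema_only_area <- as.numeric(st_area(fema_only))
```

Here at_risk keeps the two points inside the toy flood polygon, and fema_only is the region flagged by one method but not the other, which is the kind of layer that can then be mapped interactively.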

I - the author of this reproduction study - state that I completed what I outlined in my preregistration to the best of my knowledge and that no other preregistration exists pertaining to the same hypotheses and research.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

References

Baker, D., Hamshaw, S. D., & Hamshaw, K. A. (2014). Rapid flood exposure assessment of Vermont mobile home parks following Tropical Storm Irene. Natural Hazards Review, 15(1), 27–37. https://doi.org/10.1061/(ASCE)NH.1527-6996.0000112

Schmitt, R. R. (1978). Threats to validity involving geographic space. Socio-Economic Planning Sciences, 12(4), 191–195. https://doi.org/10.1016/0038-0121(78)90044-7